Self-organization of primitive metabolic cycles due to non-reciprocal interactions

One of the greatest mysteries concerning the origin of life is how it has emerged so quickly after the formation of the earth. In particular, it is not understood how metabolic cycles, which power the non-equilibrium activity of cells, have come into existence in the first instances. While it is generally expected that non-equilibrium conditions would have been necessary for the formation of primitive metabolic structures, the focus has so far been on externally imposed non-equilibrium conditions, such as temperature or proton gradients. Here, we propose an alternative paradigm in which naturally occurring non-reciprocal interactions between catalysts that can partner together in a cyclic reaction lead to their recruitment into self-organized functional structures. We uncover different classes of self-organized cycles that form through exponentially rapid coarsening processes, depending on the parity of the cycle and the nature of the interaction motifs, which are all generic but have readily tuneable features.

Since Oparin 1 proposed a picture to describe how early forms of living matter might have emerged from what Haldane described as the prebiotic soup 2 , there has been a significant amount of progress in our understanding of the physical aspects of the origin of life 3 . Recent examples of such studies include spontaneous emergence of catalytic cycles 4,5 , spontaneous growth and division of chemically active droplets [6][7][8][9] , programmable self-organization of functional structures under non-equilibrium conditions 10,11 , and controllable realization of metabolically active condensates 12 . A striking generic observation that has emerged in a variety of different scenarios is that the introduction of non-equilibrium activity in the form of catalytic activity, or a primitive form of metabolism, can be a versatile driving force for functional structure formation [13][14][15][16][17][18] with manifestations of lifelike behaviour [19][20][21][22][23][24][25][26][27] . It has also been demonstrated that the structured catalytic activity that would support the required non-equilibrium processes for primitive cells can be successfully coupled with the condensation of appropriate functional nucleotide and peptide components in membrane-free systems [28][29][30] , as well as lipid components in protocells with functionalized membranes 31,32 .
Living systems necessarily involve a set of auto-catalytic chemical reactions 33 , which have been theoretically shown to spontaneously emerge in a population of polypeptide-like structures that could assemble in a primordial soup setting [34][35][36][37] . A candidate metabolic cycle that may have served a key role in the early stages of life formation is the citric acid cycle, which consists of 11 members and exhibits evolutionary robustness and universality 38,39 . Candidates for pre-RNA and protein autocatalytic chemical networks have been identified from early microbial organisms 40 , and mixtures of RNA fragments have been experimentally observed to organize into self-replicating and catalyzing reaction networks [41][42][43][44] .
The physicochemically motivated ideas initiated by Oparin and Haldane were critically debated for much of the past century by proponents of the perspective that (genetic) information should be considered as the main organizer of matter that forms life 33,45,46 . As a modern interpretation of these considerations, we note that the currently accepted paradigm assumes that the ingredients that would later join up to form intricate components of living systems first come together by ad hoc physical forces without any input from the information that will eventually be at work in their hierarchical selforganization. The information-based organization is expected to occur when the system has already made physical condensates. However, this paradigm has so far been unable to answer two important questions. First, polymeric condensates such as coacervates are known to be intrinsically very slow, almost glass-like, in their dynamics, even if they are driven by external non-equilibrium forces like temperature and proton gradients. Therefore, it is not clear how such dense and glassy condensates that were formed randomly would have been able to efficiently evolve to form information-based functional structures via random searches given the time scale that has taken life to emerge after the formation of the earth. Secondly, the physics of condensation is governed by relatively slow power law coarsening dynamics such as the Lifshitz-Slyozov (~t 1/3 ) law 47 , even in externally driven nonequilibrium cases. Then, it is unclear from a physical point of view how life has emerged through slow coarsening into inherently slow condensates.
In connection with the above considerations and to broaden the scope of the research on the physical aspects of the origin of life, we pose the following question: how can we envisage pathways in which the information contained in chemical reaction networks from which primitive forms of metabolism can emerge would lead to structural self-organization of the corresponding components? Here, we propose a strategy that can achieve this task by employing the naturally occurring non-reciprocal interactions between catalysts that can form a cyclic reaction network. We show that model catalyticallyactive particles participating in a metabolic cycle are able to spontaneously self-organize into condensates, which may aggregate or separate depending on the number of particle species involved in the cycle, and exhibit chasing, periodic aggregation and dispersal, as well as self-stirring, thus providing a generic mechanism for spontaneous formation of metabolically-active protocells. While the observed (super-)exponential coarsening law offers a significantly faster alternative for the formation of condensates (see Fig. 1), the information-driven dynamics leads to formation of structurally active and functional condensates that exhibit lifelike behaviour already at the outset.

Results
Non-reciprocal interactions have been shown to generically emerge in active matter in the context of non-equilibrium phoretic (chemotactic) interactions 48 . Chemotaxis in response to chemical gradients has been experimentally observed not only in the context of synthetic microscale colloids, but also for biological enzymes 49,50 , molecular catalysts 51 , and nucleic acids 52,53 . The latter two observations raise the interesting prospect that chemotactic interactions may have played a role in the assembly of prebiotic systems, e.g. in a RNA world scenario. Let us consider a set of M species of chemically-active particles (Fig. 1a, top), representing catalyst molecules or enzymes. Each of the particles converts a substrate (s) into a product (p) at a rate α. At steady state, they create perturbations in the concentration field of the corresponding substrate that decays with distance r as δc (s) ∝ − α/r, and a corresponding change in the concentration of the corresponding product as δc (p) ∝ α/r (Methods). These particles are also chemotactic (Fig. 1a, bottom): when subjected to a concentration gradient of their substrate, they develop a velocity v ∝ − μ (s) ∇ c (s) with μ (s) the chemotactic mobility for the substrate, which is negative or positive if the particle is attracted to or repelled from the substrate, respectively. The particles convert substrate (s) into product (p) with a rate given by the activity α (top) and respond to gradients of these chemicals with mobilities μ (s) and μ (p) (bottom). b M active particle species are arranged in a model metabolic cycle, in which the product of species m is the substrate of species m + 1. c Non-reciprocal interactions between particles of the same (v m,m ) or adjacent (v m,m±1 ) species. Direction and colour of arrows indicate the attractive (blue, inwards arrowhead) or repulsive (red, outwards arrowhead) nature of the interaction.r is the unit vector pointing from particle m to the other particle. d Phase diagram of interaction motifs. Each region constrains the mobilities so that one interaction has a higher magnitude than the others, as highlighted by a full arrowhead. The grey asterisk indicates the location in parameter space of the interactions pictured in (c). The green line separates the self-attracting and self-repelling regions. The sign triplets correspond to the signs of (μ (s) , μ (p) , μ (p) − μ (s) ). e, Cluster growth dynamics for a cycle of M = 5 (blue, see Similarly, the particles are able to chemotax in response to gradients of their products, with a mobility μ (p) .
To create a model for primitive metabolism, we consider a simplified metabolic cycle (Fig. 1b), in which the substrate of the catalyst species m, which we denote as chemical (m), is the product of species m − 1. To close the cycle, species 1 has the product of species M as its substrate. For simplicity, we take all catalyst species to have the same parameters α, μ (s) and μ (p) , and to be present in the system at identical initial concentrations. This assumption will not limit the range of validity of the predictions, as more general choices for the parameters can be shown to lead to a number of distinct classes with relatively sizeable regions of the parameter space for each behaviour 54,55 . The cycle can achieve a steady state without net chemical production or consumption. Due to their chemical activity and chemotactic mobilities, the particle species can interact with one another through chemical fields (Fig. 1c). For instance, if we consider two particles of species m and m − 1, then the particle of species m − 1 creates, through its chemical activity, a concentration gradient of the substrate of the particle of species m, to which the latter responds by developing a velocity directed towards the particle of species m À 1, v m,mÀ1 / αμ ðsÞr , wherer is the unit vector pointing from the particle that creates the perturbation to the particle that responds to the perturbation (Methods). On the other hand, the particle of species m consumes the product of m − 1, and thus the particle of species m − 1 develops a velocity v mÀ1,m / Àαμ ðpÞr towards the other particle. As a consequence, the interactions between the particles of species m and m − 1 are nonreciprocal, i.e. v m,m−1 ≠ − v m−1,m (see Fig. 1d for different possibilities). This effective violation of action-reaction symmetry is a signature of non-equilibrium activity, leading to nontrivial many-body behaviour as has been shown for chemically-active particles interacting through a single chemical 25,56 , active mixtures interacting through generic short-range interactions 57,58 , complex plasmas 59 , and other systems 60,61 . Particles of the same species also selfinteract by consumption of their substrate and creation of their product, with a velocity v m,m / αðμ ðpÞ À μ ðsÞ Þr. We note that these effective nonreciprocal interactions mediated by chemical fields are long-ranged, with the induced velocities going as 1/r 2 (Methods).
We consider the evolution equations for the concentration fields of the active species ρ m and their substrates c (m) , given by the coupled system of 2M equations ∂ t ρ m ðr, tÞ = ∇ Á ½D p ∇ρ m + ðμ ðsÞ ∇c ðmÞ + μ ðpÞ ∇c ðm + 1Þ Þρ m , ð1Þ Equation (1) describes the conserved dynamics of the catalysts, with a diffusion term involving a species-independent coefficient D p and a chemotactic drift term in response to both substrate and product gradients. The substrate concentrations evolve according to the reaction-diffusion Eq. (2), with a diffusion coefficient D, and a reaction term corresponding to the activity of the catalysts.
The time evolution of Eqs. (1) and (2) naturally leads to the formation of clusters, akin to active phase separation 25,56 . The clusters are formed through a particularly fast and efficient coarsening process that exhibits exponential growth rather than the commonly occurring power law form, associated with processes such as Ostwald ripening, as can be seen in Fig. 1e (see Methods). This behaviour can be characterized using a simple scaling argument. When particles are collapsing onto a cluster, the rate of growth for the cluster can be estimated as dN dt = H S ρv Á dS where the velocity v = − μ ∇ c can be expressed in terms of the particle concentration by using Gauss theorem and the relation − ∇ 2 c = αρ/D, which yields dN dt = μα D ρN. This expression can be integrated to obtain NðtÞ = N 0 exp μα predicts an exponential growth law for constant ρ and allows for superexponential growth if the density increases with time, which matches well with the results presented in Fig. 1e. This observation suggests that non-equilibrium phoretic interactions have the ability to guide formation of dense clusters in a fast and efficient manner, and as such, can be strong candidates for creating the appropriate conditions for the emergence of early functionalized protocells.
A linear stability analysis on Eqs. (1) and (2) (Methods) around a spatially-homogeneous solution leads to the following eigenvalue equation for the macroscopic (long-wavelength) particle density modes The matrix Λ m,n describes the velocity response of species m to species n, and is defined as follows where ρ 0 represents the initial homogeneous concentrations. By definition, Λ m,n is negative, or positive, if m is attracted to, or repelled from, n, respectively. The form of Λ m,n suggests a classification scheme as there are six possible interaction motifs (Fig. 1d), representing the interactions of each species with itself as well as with its two neighbours in the metabolic cycle. The signs of the interactions are represented diagrammatically, following the conventions defined in Fig. 1c and d. The eigenvalues λ ℓ (ℓ ∈ {1, …, M}) allow us to predict the stability of the system: ReðλÞ > 0 for any eigenvalue λ indicates an instability, whereas ReðλÞ < 0 for all eigenvalues implies a stable homogeneous state. The eigenvector δρ ' m , in turn, gives the stoichiometry at the onset of instability, i.e. the ratio of the different species within the growing perturbation, which may be positive, for species that aggregate together, or negative, for species that separate.
The topology of the metabolic cycle strongly influences its selforganization. As a point of comparison, we consider a non-cyclic system, in which M catalytic species all act on a single chemical field. In this case, all the coefficients of the interaction matrix are equal to αμρ 0 , leading to a system with only one nonzero eigenvalue λ = − Mαμρ 0 /D. The corresponding instability condition is αμ < 0, and the instability is equivalent to that of the Keller-Segel model 25,62 . The model metabolic cycle studied here, however, presents a different category. As the interaction matrix (4) is a circulant matrix, its eigenvalues are easily calculated as (see Supplementary Note 1 for graphical representations of the eigenvalue spectra for different species numbers). There are now M − 1 nonzero eigenvalues, which come as pairs of complex conjugate numbers with the possible exception of λ M/2 for M even. In stark contrast with the non-cyclic system, the complex character of these eigenvalues opens the door to oscillatory behaviour. The instability condition, obtained by imposing that the real part of at least one eigenvalue is larger than zero, in turn corresponds to i.e. the catalytic species have to be self-attracting for an instability to occur. This is represented in the phase diagrams of Fig. 2 Fig. 3b, Supplementary Movie 2) which exchange particles without growing, as found in particle-based Brownian dynamics simulations of the same system (Methods). Remarkably, we find key differences between cycles with even or odd number of species. In the case of an even species number M = 2K, the eigenvalue with largest real part (which dominates the instability) is real and given by implying that the instability is nonoscillatory with the corresponding eigenvector Thus, at onset of instability, all the species with equal parity tend to aggregate together and to separate from the species of opposite parity (Fig. 2a, above the green line). Brownian dynamics simulations show that this prediction carries over to the final phase-separated state; an example is shown in Fig. 3a (Supplementary Movie 3). These simulations show an initial exponential growth of M clusters, each containing all the particles of a given species. The steady state for an even number of self-attracting, cross-repelling species is two large "clusters of clusters", one encompassing clusters of the even-labelled species, the other of the odd-labelled species. Both the transient and the steady state are captured by the growth dynamics shown in Fig. 1e, with the average cluster size initially growing exponentially and saturating at half of the total particle population.
A variety of behaviour is observed in the case with chasing interactions among neighbours, based on the relative values of the chasing strength |μ (s) + μ (p) | as compared to the self-attraction strength |μ (p) − μ (s) |. If both values are of the same order of magnitude, the system behaves similarly to the cross-repelling case, except that the resulting clusters can chase each other or rotate in place (Supplementary Fig. 4a and Supplementary Movie 4). For the cases where the value of the self-attraction is much lower than the chasing strength, fully-hybrid clusters containing all species of the same parity form over longer timescales, as opposed to "clusters of clusters" as in the crossrepelling case (Supplementary Movie 5). Finally, for almost negligible self-attraction, transient oscillations are observed before cluster formation ( Supplementary Fig. 4b, Supplementary Movie 6).
For cycles with an odd number of species M = 2K + 1, the largest real part corresponds to the complex conjugate pair of eigenvalues (see Supplementary Note 1) suggesting the potential for long-lived oscillations, or even oscillatory steady states, with the real part corresponding to the growth rate of the perturbation and the imaginary part to its oscillation frequency. The corresponding eigenvectors δρ K + 1 2 ± 1 2 are also a pair of complex conjugates, with components given by for m = 1, . . . , 2K + 1. The species are out of phase by 2π/(2K + 1) with respect to their second-nearest neighbour during the oscillations. Since the number of species is odd, parity-based cluster aggregation is not possible: if two clusters attempt to come together, a third will systematically come to break them apart. For cross-repelling species, this leads to a segregation into single-species clusters which separate in a way that minimizes their overall repulsion (Fig. 3b, Supplementary Movie 7). Similarly to the even case with M = 2K, this behavior is captured by the growth statistics displayed in Fig. 1e, where mean cluster size exhibits an initial exponential growth and saturates at a value corresponding to the formation of M individual clusters.
In the case of chasing cross-interactions, oscillations become visible when the growth rate is slower than the oscillation frequency, which corresponds to the condition Àμ ðpÞ ≲ À μ ðsÞ 1 + cos π which defines the orange region in Fig. 2b. We note that this inequality only sets an order of magnitude for the transition from oscillatory to non-oscillatory dynamics, rather than a sharp boundary. The behaviour of the system again depends on the relative values of the self attraction magnitude |μ ðpÞ À μ ðsÞ | and the chasing strength |μ ðpÞ + μ ðsÞ |. When self-attraction is weaker than the chasing strength (i.e. close to the instability line), Brownian dynamics simulations indeed show a persistent oscillatory dynamical behaviour with the following choreography for the case in which each species chases after the previous one: a single-species cluster of a species m forms transiently, and is then "invaded" by species m + 1, leading to an explosion that disperses species m back into the solution. Species m then invades a cluster of species m − 1, and so on, in a sequential order until M explosion events have occurred and the cycle starts again. In the case with M = 5 ( Fig. 4 and Supplementary Movie 8; see Supplementary Note 2 for a quantification of the oscillation dynamics), we observe that the system comes back to a state similar to the initial one, except for a swap in the locations of the clusters. This change occurs because the clusters of the second-nearest-neighbour species in the cycle tend to form pairs. One component of one of these pairs is replaced in every explosion event by the species preceding it in the cycle, such that, after five explosions, the pairs have been switched in space. The reverse dynamics (species m invading species m + 1) are observed if the signs of μ (s) and μ (p) are reversed, so that each species chases the next one in the cycle. For even weaker self-attraction or stronger chasing, the clusters do not have time to form. In this case, oscillations are observed in a dilute mixture of catalytic particles, where clusters are replaced by transient zones of higher concentration (Supplementary Movie 9). This can create a self-stirring solution, favouring the mixing and assembly of solution components in time scales considerably shorter than those allowed by passive diffusion. Lastly, if the perturbation growth rate is instead larger than its oscillation frequency (red region in Fig. 2b), then the dynamics leads to formation of stable clusters. We have observed in simulations the formation of chasing hybrid clusters similar to the case with even number of species ( Supplementary Fig. 5, Supplementary Movie 10).
These results can be contrasted with the behaviour of reactiondiffusion systems, which can also undergo instabilities as first formulated by Turing 63 , and have been extensively studied for both interactions between the active species. All species are self-attracting and repel both neighbours in the cycle (corresponding to the top-right quadrants in Fig. 2a, b). Top right: corresponding eigenvalue λ spectrum, in units of − (μ (p) − μ (s) ) αρ 0 /D for the real part and − (μ (s) + μ (p) )αρ 0 /D for the imaginary part. The eigenvalue (or complex conjugate pair) with the largest real part is shown in red. In b, the corresponding conjugate pair has an imaginary part much smaller than its real part, so that the dynamics of the system are non-oscillatory. Bottom: Schematic representations of the time evolution of the aggregation dynamics (left) and corresponding snapshots of molecular dynamics simulations (right, see Supplementary Movies 3 and 7 for even and odd cases, respectively) are shown side by side. Dashed lines in a indicate the parity-based aggregation that occurs for an even number of species.
nonmass-conserving [64][65][66] and mass-conserving 67,68 reactions. Such systems have been shown to exhibit pattern formation, macroscopic phase separation, or travelling wave fronts. In contrast, the model we study in this work is able to exhibit a larger variety of complex behaviour, because of the non-reciprocal interactions. Additionally, each active species is individually conserved in our model, meaning that it is truly the catalysts and not the reactants that self-organize in this case.

Discussion
Our work shows that catalytically-active and chemotactic particles participating in a primitive metabolic cycle exhibit a variety of structural complex collective behaviour. Due to the nature of the gradientmediated interactions involved, such particles are able to interact over large distances, and undergo spontaneous and exponentially rapid cluster formation that serves to support their metabolic function. This feature can help overcome the barrier represented by the time needed for the right types of molecules to meet by chance at sufficiently high concentrations in the first place, and selectively drives the formation of functional metabolic condensates based on the information embedded in the chemical reaction network of the components. This suggests that naturally occurring phoretic transport mechanisms might be able to equip the biological paradigm of liquid-liquid phase separation with an information-controlled strategy for metabolic structure formation. Moreover, since the overall chemical activity of enzymes can be enhanced with suitable clustering behaviour 15,69 , the ability to engineer complex clustering features such as those reported here may help improve the design and efficiency of synthetic reaction networks.
Depending on the parity of the number of different species involved in the cycle and on their chemotactic parameters, these clusters might consist of a single or several species, thereby accommodating a range of design strategies for metabolic structure formation. If the number of species in the cycle is odd, chasing interactions may emerge at the macroscopic level, similar to those that have been observed in recent experiments 26,27 , although in this case leading to long-lived, system-wide oscillations. Our work suggests that a metabolic cycle consisting of odd number of members may have an advantage (over a cycle with an even number of members) due to the formation of the explosive oscillatory stationary state. It remains to be seen whether the fact that the universal citric acid cycle consists of 11 members can in some way be related to this observation. The observed variety of emergent structural behaviour with highly precise control over the composition of the constituents of the metabolically active clusters hints at a significant possible role for catalytically active molecules at the origin of life: the molecules that are metabolically connected to each other will preferentially and efficiently form active clusters together, hence serving as potential candidates for the nucleation of early forms of life.
What is remarkable about our proposal to use non-reciprocal interactions in this context is that such interactions generically emerge in non-equilibrium systems with chemical catalytic activity 49 , which are abundantly present in the cell (molecular catalysts and enzymes involved with metabolic activity) and can be easily synthesized in artificial systems (catalytic colloids, RNA fragments, etc.) for controlled in vitro experiments. In this sense, the theoretical developments that have led to significant progress in the field of active matter in the laboratory setting can now be used to guide new experimental strategies for research in the field of origin of life. Our work offers a theoretical and conceptual platform towards developing this possibility.

Linear stability analysis
We consider a system of M catalytically-active particles described by concentration fields ρ m r, t ð Þ. A given species m converts its substrate, chemical (m) described by a concentration field c ðmÞ r, t ð Þ, into its product, which will in turn be the substrate (m + 1) of the catalytic species m + 1. This conversion takes place at a rate α > 0, which we take to be constant (i.e. catalysis is assumed to take place in the substratesaturated regime), and equal for all species. The particles are also chemotactic for their substrate and their product, with respective mobilities μ (s) and μ (p) , again chosen to be equal for all species. We start from the evolution equations for the substrate and product concentrations given in (2). We then consider the effects of a time-and space-dependent perturbation ðδρ m r, t ð Þ,δc ðmÞ r, t ð ÞÞaround an initially homogeneous state (ρ 0 , c 0 ). We also assume a separation of timescales: as the substrates are typically much smaller than the catalytic particles and thus diffuse faster, we assume that their concentrations equilibrate more quickly to a quasi-steady state for a given configuration of the fields ρ m , meaning that we set ∂ t c (m) ≃ 0. Fouriertransforming the linearized equations with respect to space leads to the following equation for the δc (m) mode with wavevector q: Reintroducing this perturbation into the linearized equation (1), which we Laplace-transform with respect to time, leads to the system of equations for the different modes with growth rate λ and wavevector q: which is an eigenvalue equation. It is readily seen that the fastest growing mode is the q = 0 mode. Therefore, we focus on this mode  Fig. 3b, but now ReðλÞ < ImðλÞ for the most unstable conjugate pair, so that the dynamics of the system are oscillatory.
throughout the paper. The system is unstable when Reðλðq = 0ÞÞ > 0. Denoting the interaction matrix as Λ mn (as defined in Eq. (4)), we obtain the result in Eq. (3).

Pair interactions between spherical catalytically active particles
In order to perform Brownian dynamics simulations of the system, we calculate the effective interaction between two spherical catalytically active particles in the far-field approximation, which we do in two steps. We first consider an isolated particle of species m, with activity α and radius R, taken to be equal for all species. We place the particle at the origin, and use spherical coordinates. The perturbation δc (n) induced by the particle, which is assumed to equilibrate quickly with respect to the motion of all particles, is a solution of the Laplace equation: The corresponding boundary conditions, however, depend on whether the chemical is the substrate (n = m), the product (n = m + 1), or neither. Indeed, the boundary condition is determined by the diffusive fluxes across the particle surface due to its chemical activity, resulting in Now consider a second particle of species n placed at a location r. Its velocity v n,m (r) in response to the perturbation created by the particle of species m will be v n,m ðrÞ = Àμ ðsÞ ∇δc ðnÞ if n = m + 1, Àμ ðpÞ ∇δc ðn + 1Þ if n = m À 1, Àμ ðsÞ ∇δc ðnÞ À μ ðpÞ ∇δc ðn + 1Þ if n = m, 0 otherwise : Using Eq. (16), the responses can be explicitly written as

Brownian dynamics simulations
We perform Brownian dynamics simulations using a custom program written in the Julia language. We simulate N particles equally distributed among M species. Particles are started out randomly distributed in space, corresponding to a homogeneous state.
The equations of motion used in our simulations are where i ∈ {1, 2, …, N}. Here, S(i) gives the species index corresponding to the particle index i, the velocities are calculated using Eq. (18), D p is the diffusion coefficient of the particles, and ξ i corresponds to a Gaussian white noise with zero mean and unit variance acting on particle i. The equations of motion are integrated using a forward Euler scheme. At every integration step, an overlap correction is then performed to account for hard-core repulsion between the spheres, using the "elastic collision method" 70 . We simulate the system in a three dimensional box of side length L with periodic boundary conditions, and interactions are treated according to the minimum image convention. Note that we do not use Ewald summation in our numerical simulations, which would be relevant if our goal was to simulate system sizes considerably larger than currently considered in our study.
The particle diameter, σ, which is taken to be the same for all species, sets the basic length scale of the simulation. We can define reference activity and mobility scales, respectively α 0 and μ 0 , from which we build a velocity scale V 0 = α 0 μ 0 /(4πDσ 2 ). From these scales, we can define dimensionless time τ = tV 0 /σ, activityα = α=α 0 , and mobilityμ = μ=μ 0 scales. Finally, we define a reduced particle diffusion coefficientD p = D p =ðV 0 σÞ, which serves as an effective noise intensity or temperature.

Simulation parameters
All simulations have been performed with a box size L/σ chosen such that the total volume fraction of the particles is ϕ = 0.005, as well as the choice of activityα = 1, and noise strengthD p = 0:02. Simulations of respectively M = 5 (M = 6) species are performed with N/M = 333 (N/M = 400) particles per species. We use the following rule of thumb for parameter choices: the products in the form ofαμ are chosen to be of order unity, while the time step is chosen such that dτ≤0.001. The total simulation times are usually of the order of τ tot ≈ 10 2 − 10 3 . In the Supplementary Information, we describe each simulation movie. For the specific parameters used in each simulation, see Table 1.

Cluster growth law determination
In order to qualitatively determine the cluster growth law for both an even (see Fig. 3a of the main text and Supplementary Movie 7) and an odd (see Fig. 3b of the main text and Supplementary Movie 3) number of species, we use a simple clustering algorithm implemented in the Julia programming language: starting from each individual particle regarded as a cluster of size 1, we assign two particles to be in the same cluster if their distance is below a threshold, which we choose to be 1.1 times the particle diameter σ. In order to speed up the calculations, we additionally use a link-list algorithm with a cell size 1.1σ, which only requires calculation of the distance of each particle to the particles in its vicinity. We run 100 simulations using the parameters given in the previous subsection for Supplementary Movies 3 and 7, with a longer total time τ tot = 400. We then perform an ensemble average on these data to obtain the growth laws shown in Fig. 1e.

Data availability
The data supporting the main findings of this study are available in the paper and its Supplementary Information. Any additional data can be made available upon request.

Code availability
The algorithms for the codes supporting the main findings of this study are available in the paper and its Supplementary Information. Any additional information concerning the code can be made available upon request.